fix(opt): decouple slot-stack from inst_id in wasm_to_ir (#121)#122
fix(opt): decouple slot-stack from inst_id in wasm_to_ir (#121)#122avrabe wants to merge 2 commits into
Conversation
`OptimizerBridge::wasm_to_ir` overloaded `inst_id` as both the unique IR instruction id AND a vreg-slot index, with back-references like `inst_id.saturating_sub(2)` assuming a one-to-one correspondence with the wasm value stack. That assumption broke whenever any wasm op consumed a stack slot without producing one — Drop, LocalSet, GlobalSet, the i32/i64 store family, BrIf, and the structural Block/Loop/End markers. The next binary or unary op's back-reference would then index a stale or never-mapped vreg, and `get_arm_reg` would either trip the PR #101 defensive panic or (pre-PR-101) silently fall back to R0 — the silent-miscompilation class first surfaced in issue #93. Gale (the real-hardware test rig) caught WASM modules in the field that tripped this on production silicon; the cargo-fuzz `wasm_ops_lower_or_error` harness on PR #117 surfaced the same class six different ways (Nop/Unreachable/Return were closed there; Drop, LocalSet, Store, Block/Loop/End remained until this PR). Fix: introduce `slot_stack: Vec<u32>` in `wasm_to_ir` that mirrors the wasm value stack. Each producer pushes its dest vreg onto slot_stack; each consumer pops to discover its source vreg. `inst_id` reverts to its original meaning — a monotonically increasing unique IR id — and is no longer used for slot lookup. i64 values occupy two consecutive entries on slot_stack (lo first, then hi), matching the (dest_lo, dest_hi) two-vreg-pair layout already used by i64 opcodes. I64ExtendI32U/S aliases dest_lo to the consumed i32 src vreg by IR convention (preserved); I32WrapI64 aliases dest to src_lo (preserved). Drop becomes an explicit `slot_stack.pop(); continue` no-IR-emit arm; Nop/Unreachable/Return emit Opcode::Nop with no slot_stack effect. Drive-by: `i64_operand_count` was missing I64DivS/I64DivU/I64RemS/ I64RemU (so `analyze_i64_local_gets` failed to mark their i64 operands), which was masked by the same inst_id-slot scrambling. Added them; the existing i64-div WAST tests now exercise the i64 LocalGet path instead of fortuitously-correct i32 Loads. The catch-all `_ => Opcode::Nop` is preserved as a bug-finder: unknown ops do not touch slot_stack, so subsequent consumers fail loudly via `slot_stack.pop().expect(...)` instead of silently mis-binding vregs. Regression coverage: new `crates/synth-synthesis/tests/regression_issue_121_slot_stack.rs` exercises Drop/LocalSet/GlobalSet/Store/BrIf/Block-Loop-End/LocalTee between producer-and-consumer plus i32 and i64 variants. Two semantic-correctness probes confirm that Popcnt reads the surviving stack value (not the dropped one) — proving the fix addresses silent miscompilation, not just the panic. Test delta: +12 tests, 0 regressions. The 4 fuzz-related regression tests from #100/#101/#103/#104 plus the #93 memset/i64-shift tests all continue to pass. Refs: issue #121, PR #117 (fuzz-harness reproductions), issue #93 (silent-drop class), PR #101 (defensive panic), PR #100 (fuzz harness).
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
The slot_stack refactor in f0becb6 closes the silent-miscompilation class that Gale hit on silicon. But the .expect() panics inside the new slot_stack.pop() sites turn malformed-input cases (which the fuzz harness can construct) into a different panic class — same defensive "crash the compiler not the firmware" intent, but the harness contract is "lower or Err, never panic". This commit ports #117's pre-flight wasm_stack_check into the #122 branch so malformed inputs return Err *before* reaching the slot_stack pops. The two layers together give the full guarantee: Pre-flight (synth_core::wasm_stack_check::check_no_underflow) ↓ rejects malformed wasm before lowering — typed Err wasm_to_ir slot_stack model ↓ correct semantics for valid wasm — fixes Gale's silicon class ir_to_arm ↓ unmodified Includes: - `crates/synth-core/src/wasm_stack_check.rs` (365 lines, 16 unit tests) — copy from PR #117's branch. - Module declaration in `crates/synth-core/src/lib.rs`. - Pre-flight call at the top of `OptimizerBridge::optimize_full` (`optimizer_bridge.rs:1827`). - Pre-flight call at the top of `InstructionSelector::select_with_stack` (`instruction_selector.rs:3559`). - `.github/workflows/fuzz-smoke.yml` — ported from #117 with the seed_corpus replay step. `wasm_ops_lower_or_error` re-promoted to `gating: true` now that the slot_stack model + pre-flight together close the panic class. - `fuzz/seed_corpus/wasm_ops_lower_or_error/seed-pr122-i32sub-empty-stack` - `fuzz/seed_corpus/wasm_to_ir_roundtrip_op_coverage/seed-pr122-local-tee-empty-stack` — the two crash artifacts the harness found on the first #122 CI run; now seeded for deterministic replay. When PR #117 lands first on main, the rebase will deduplicate wasm_stack_check.rs cleanly (identical files). Local verification: - `cargo test -p synth-core wasm_stack` — 16/16 pass. - `cargo test -p synth-synthesis --test regression_issue_121_slot_stack` — 12/12 pass. - `cargo test --workspace --exclude synth-verify` — 0 regressions. Refs: issues #121 (Gale silicon), #117 (pre-flight origin).
|
Pushed 3e3e3b1 — wires #117's pre-flight
Also: Seeded the two crash artifacts from the first #122 CI run ( When #117 merges first, the rebase will deduplicate Local: 16/16 stack-check + 12/12 regression + 0 workspace regressions. |
Summary
Closes #121 — the root architectural fix that PR #117's five rounds of
continuepatches were skating around. The wasm_to_ir lowering now tracks producer vregs via an explicit `slot_stack: Vec` parallel to `inst_id`, instead of overloading `inst_id` as both the IR id and the wasm-stack-slot index.This is silicon-priority: the bug Gale reported in the field (wasm modules with Drop/LocalSet/Store mid-stream) is fixed here. PR #117's temporary demotion of `wasm_ops_lower_or_error` to `gating: false` is unaffected (the demote commit was on #117's branch, never reached main), but #117 may now rebase and revert its workflow change once this lands.
What was broken
`OptimizerBridge::wasm_to_ir` used a single `inst_id` counter as both the unique IR-instruction id AND a vreg-slot index. Binary/unary handlers used `inst_id.saturating_sub(N)` to look up operands, assuming a 1:1 correspondence with wasm value stack positions. That assumption broke whenever a wasm op consumed a slot without producing a vreg (Drop, LocalSet, GlobalSet, Stores, control-flow ops...). The result was either:
The non-optimized path (`select_with_stack`) was unaffected because it uses a real value stack.
What this PR does
Drive-by fix
`i64_operand_count` was missing the i64 div/rem variants (I64DivS/U, I64RemS/U). The old `inst_id.saturating_sub(4)` math happened to fortuitously work for the existing `i64_div.wast` test due to saturating-sub at slot boundaries; the slot_stack refactor unmasked this as a real `pop()` on an empty stack. Added the four ops to `i64_operand_count` to resolve cleanly.
Tests
`crates/synth-synthesis/tests/regression_issue_121_slot_stack.rs` — 12 new tests, all passing:
Panic-free shapes (the fuzz-found inputs):
Semantic correctness (proving the silent-miscompilation path is fixed, not just the panic path):
Full workspace test (excluding synth-verify — z3 network issue): 1041 passing, 0 regressions. The 4 existing AAPCS/i64 regression tests from PR #100/#101/#103/#104 plus the #93 memset/i64-shift tests all continue to pass.
Test plan
Refs
🤖 Generated with Claude Code